Efficient Parallel Translating Embedding For Knowledge Graphs
Knowledge graph embedding aims to embed entities and relations of knowledge
graphs into low-dimensional vector spaces. Translating embedding methods regard
relations as translations from head entities to tail entities, and they achieve
state-of-the-art results among knowledge graph embedding methods. However, a
major limitation of these methods is their time-consuming training process,
which may take several days or even weeks for large knowledge graphs and makes
practical application difficult. In this paper, we propose
an efficient parallel framework for translating embedding methods, called
ParTrans-X, which enables these methods to be parallelized without locks by
exploiting the distinctive structure of knowledge graphs. Experiments on two
datasets with three typical translating embedding methods, i.e., TransE [3],
TransH [17], and a more efficient variant, TransE-AdaGrad [10], validate that
ParTrans-X can speed up the training process by more than an order of magnitude.
Comment: WI 2017: 460-46
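For context on what is being parallelized, here is a minimal sketch of a TransE-style scoring function and margin-based SGD step in Python; the function names and the L1 formulation are the standard textbook ones rather than the paper's implementation, and the lock-free scheduling of workers, which is exactly what ParTrans-X adds, is not shown:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE treats a relation as a translation: a true triple (h, r, t)
    should satisfy h + r ≈ t, so a lower distance means more plausible."""
    return np.linalg.norm(h + r - t, ord=1)

def sgd_step(emb, pos, neg, lr=0.01, margin=1.0):
    """One margin-based SGD step on a (positive, corrupted) triple pair.
    In a ParTrans-X-style setup, many workers run steps like this
    concurrently without locks, relying on the sparsity of the knowledge
    graph to keep conflicting updates rare. `emb` maps ids to vectors."""
    h, r, t = (emb[i] for i in pos)
    h2, r2, t2 = (emb[i] for i in neg)
    if margin + transe_score(h, r, t) - transe_score(h2, r2, t2) > 0:
        g = np.sign(h + r - t)      # subgradient of the L1 distance
        emb[pos[0]] -= lr * g
        emb[pos[1]] -= lr * g
        emb[pos[2]] += lr * g
        g2 = np.sign(h2 + r2 - t2)  # push the corrupted triple apart
        emb[neg[0]] += lr * g2
        emb[neg[1]] += lr * g2
        emb[neg[2]] -= lr * g2
```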
Compressed sensing and robust recovery of low rank matrices
In this paper, we focus on compressed sensing and recovery schemes for low-rank matrices, asking under what conditions a low-rank matrix can be sensed and recovered from incomplete, inaccurate, and noisy observations. We consider three schemes, one based on a certain Restricted Isometry Property and two based on directly sensing the row and column space of the matrix. We study their properties in terms of exact recovery in the ideal case, and robustness issues for approximately low-rank matrices and for noisy measurements.
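To make one standard recovery scheme in this family concrete, below is a minimal numpy sketch of iterative singular value thresholding for recovering a low-rank matrix from partially observed entries; the parameters and defaults are illustrative, and this is a generic algorithm rather than necessarily the scheme analyzed in the paper:

```python
import numpy as np

def svt_complete(M_obs, mask, tau=5.0, step=1.0, iters=200):
    """Recover a low-rank matrix from partial observations via iterative
    singular value thresholding. `mask` is 1 where M_obs was observed.
    `tau`, `step`, and `iters` are illustrative defaults, not tuned."""
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(iters):
        # Gradient step pulling X toward the observed entries.
        X = X + step * mask * (M_obs - X)
        # Shrink singular values: the proximal map of the nuclear norm,
        # which biases the iterate toward low rank.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
    return X
```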
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referenced as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
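Since the inference step is loopy belief propagation, here is a minimal sum-product sketch for a pairwise model of the kind described (mentions as nodes, candidate entities as states); the dict-based graph encoding, names, and shapes are illustrative assumptions, not the paper's API:

```python
import numpy as np

def loopy_bp(unary, pairwise, iters=20):
    """Sum-product loopy belief propagation on a pairwise model.
    `unary[i]` is a (K_i,) array of local candidate scores for mention i;
    `pairwise[(i, j)]` is a (K_i, K_j) entity co-occurrence compatibility
    table, stored once per unordered pair."""
    nbrs = {i: [] for i in unary}
    msgs = {}
    for (i, j) in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = np.ones(len(unary[j])) / len(unary[j])
        msgs[(j, i)] = np.ones(len(unary[i])) / len(unary[i])
    for _ in range(iters):
        for (i, j) in list(msgs):
            pw = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
            # Combine i's local scores with messages from its other
            # neighbours, then sum out i's candidate entities.
            prod = unary[i].astype(float)
            for k in nbrs[i]:
                if k != j:
                    prod = prod * msgs[(k, i)]
            m = prod @ pw
            msgs[(i, j)] = m / m.sum()
    beliefs = {}
    for i in unary:
        b = unary[i].astype(float)
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()  # approximate marginal over candidates
    return beliefs
```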
A Cost-based Optimizer for Gradient Descent Optimization
As the use of machine learning (ML) permeates into diverse application
domains, there is an urgent need to support a declarative framework for ML.
Ideally, a user will specify an ML task in a high-level and easy-to-use
language and the framework will invoke the appropriate algorithms and system
configurations to execute it. An important observation towards designing such a
framework is that many ML tasks can be expressed as mathematical optimization
problems, which take a specific form. Furthermore, these optimization problems
can be efficiently solved using variations of the gradient descent (GD)
algorithm. Thus, to decouple a user specification of an ML task from its
execution, a key component is a GD optimizer. We propose a cost-based GD
optimizer that selects the best GD plan for a given ML task. To build our
optimizer, we introduce a set of abstract operators for expressing GD
algorithms and propose a novel approach to estimate the number of iterations a
GD algorithm requires to converge. Extensive experiments on real and synthetic
datasets show that our optimizer not only chooses the best GD plan but also
allows for optimizations that achieve orders-of-magnitude performance speed-ups.
Comment: Accepted at SIGMOD 201
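A toy version of cost-based plan selection can be sketched in a few lines; the convergence-rate formulas below are the textbook asymptotics for smooth convex problems, not the paper's iteration estimator, and all constants and names are illustrative:

```python
def choose_gd_plan(n_rows, dim, target_eps=1e-3):
    """Pick the cheapest gradient-descent plan from a small candidate set,
    in the spirit of a cost-based GD optimizer: estimated total cost =
    (iterations to reach target_eps) x (work per iteration)."""
    plans = {
        "batch_gd":   {"iters": lambda eps: 1.0 / eps,       # O(1/eps)
                       "rows":  lambda n: n},                # full pass/step
        "sgd":        {"iters": lambda eps: 1.0 / eps ** 2,  # O(1/eps^2)
                       "rows":  lambda n: 1},                # one row/step
        "mini_batch": {"iters": lambda eps: 10.0 / eps,      # in between
                       "rows":  lambda n: 128},              # fixed batch
    }
    cost = {name: p["iters"](target_eps) * p["rows"](n_rows) * dim
            for name, p in plans.items()}
    return min(cost, key=cost.get)

# On a large dataset the per-iteration savings of SGD dominate:
print(choose_gd_plan(n_rows=10_000_000, dim=100))  # -> "sgd"
```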
High-sensitivity microfluidic calorimeters for biological and chemical applications
High-sensitivity microfluidic calorimeters raise the prospect of achieving high-throughput biochemical measurements with minimal sample consumption. However, it has been challenging to realize microchip-based calorimeters possessing both high sensitivity and precise sample-manipulation capabilities. Here, we report chip-based microfluidic calorimeters capable of characterizing the heat of reaction of 3.5-nL samples with 4.2-nW resolution. Our approach, based on a combination of hard- and soft-polymer microfluidics, provides both exceptional thermal response and the physical strength necessary to construct high-sensitivity calorimeters that can be scaled to automated, highly multiplexed array architectures. Polydimethylsiloxane microfluidic valves and pumps are interfaced to parylene channels and reaction chambers to automate the injection of analyte at 1 nL and below. We attained excellent thermal resolution via on-chip vacuum encapsulation, which provides unprecedented thermal isolation of the minute microfluidic reaction chambers. We demonstrate performance of these calorimeters by resolving measurements of the heat of reaction of urea hydrolysis and the enthalpy of mixing of water with methanol. The device structure can be adapted easily to enable a wide variety of other standard calorimeter operations; one example, a flow calorimeter, is described.
Guaranteed clustering and biclustering via semidefinite programming
Identifying clusters of similar objects in data plays a significant role in a
wide range of applications. As a model problem for clustering, we consider the
densest k-disjoint-clique problem, whose goal is to identify the collection of
k disjoint cliques of a given weighted complete graph maximizing the sum of the
densities of the complete subgraphs induced by these cliques. In this paper, we
establish conditions ensuring exact recovery of the densest k cliques of a
given graph from the optimal solution of a particular semidefinite program. In
particular, the semidefinite relaxation is exact for input graphs corresponding
to data consisting of k large, distinct clusters and a smaller number of
outliers. This approach also yields a semidefinite relaxation for the
biclustering problem with similar recovery guarantees. Given a set of objects
and a set of features exhibited by these objects, biclustering seeks to
simultaneously group the objects and features according to their expression
levels. This problem may be posed as partitioning the nodes of a weighted
bipartite complete graph such that the sum of the densities of the resulting
bipartite complete subgraphs is maximized. As in our analysis of the densest
k-disjoint-clique problem, we show that the correct partition of the objects
and features can be recovered from the optimal solution of a semidefinite
program in the case that the given data consists of several disjoint sets of
objects exhibiting similar features. Empirical evidence from numerical
experiments supporting these theoretical guarantees is also provided.
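To give the shape of such a relaxation, a representative semidefinite program for extracting k dense disjoint cliques from a weight matrix W can be written as follows; this is a generic form of this family of relaxations, and the paper's exact constraint set may differ:

```latex
\begin{align*}
\max_{X \in \mathbb{S}^n}\quad & \operatorname{tr}(WX) \\
\text{s.t.}\quad & X \succeq 0, \qquad X \ge 0 \ \text{(entrywise)}, \\
                 & \operatorname{tr}(X) = k, \qquad X\mathbf{1} \le \mathbf{1}.
\end{align*}
```

A planted clustering $C_1, \dots, C_k$ corresponds to the feasible point $X^\ast = \sum_i |C_i|^{-1} \mathbf{1}_{C_i} \mathbf{1}_{C_i}^{\top}$, and "exact recovery" refers to conditions under which this $X^\ast$ is the unique optimum.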
Asynchronous Training of Word Embeddings for Large Text Corpora
Word embeddings are a powerful approach for analyzing language and have been
widely popular in numerous tasks in information retrieval and text mining.
Training embeddings over huge corpora is computationally expensive because the
input is typically sequentially processed and parameters are synchronously
updated. Distributed architectures for asynchronous training that have been
proposed either focus on scaling vocabulary sizes and dimensionality or suffer
from expensive synchronization latencies.
In this paper, we propose a scalable approach that trains word embeddings by
partitioning the input space instead, in order to scale to massive text corpora
without sacrificing the quality of the embeddings. Our training procedure
does not involve any parameter synchronization except a final sub-model merge
phase that typically executes in a few minutes. Our distributed training scales
seamlessly to large corpus sizes, and on a variety of NLP benchmarks, models
trained by our distributed procedure achieve comparable, and sometimes up to
45% better, performance while requiring a fraction of the time taken by the
baseline approach. Finally, we also show that we are robust to missing words in
sub-models and are able to effectively reconstruct word representations.
Comment: This paper contains 9 pages and has been accepted in the WSDM201
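A toy version of the partition-and-merge idea fits in a few lines of Python; gensim's Word2Vec stands in for the sub-model trainer (the paper's own trainer and merge rule are not specified here), and the naive vector averaging glosses over the alignment care a real merge phase needs:

```python
from gensim.models import Word2Vec  # stand-in skip-gram trainer

def train_partitioned(corpus_parts, dim=100):
    """Train one fully independent sub-model per corpus partition (in a
    real deployment each would run on its own worker with no parameter
    synchronization), then merge once at the end by averaging the vectors
    of words that appear in several partitions."""
    submodels = [Word2Vec(sentences=part, vector_size=dim, min_count=1).wv
                 for part in corpus_parts]  # would run concurrently
    merged, counts = {}, {}
    for wv in submodels:
        for word in wv.index_to_key:
            if word in merged:
                merged[word] = merged[word] + wv[word]
                counts[word] += 1
            else:
                merged[word], counts[word] = wv[word].copy(), 1
    return {w: merged[w] / counts[w] for w in merged}
```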
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide F1 scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real-world
applications such as network classification and anomaly detection.
Comment: 10 pages, 5 figures, 4 table
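The walk-generation step at the heart of DeepWalk is simple enough to sketch; the Python below follows the standard DeepWalk recipe (per pass, shuffle the vertices and start one truncated random walk from each), with an adjacency-dict encoding that is an assumption for illustration rather than the paper's interface:

```python
import random

def truncated_random_walks(adj, num_walks=10, walk_len=40, seed=0):
    """Generate the truncated random walks DeepWalk treats as sentences.
    `adj` maps each vertex to a list of neighbours. Per pass, the vertex
    order is shuffled and one walk is started from every vertex."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        order = list(adj)
        rng.shuffle(order)
        for start in order:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append([str(v) for v in walk])
    return walks
```

The walks are then handed to any word2vec-style skip-gram trainer, so each vertex receives a vector shaped by its walk co-occurrences, exactly the walks-as-sentences analogy the abstract describes.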
Recognizing the need for personalization of haemophilia patient‐reported outcomes in the prophylaxis era
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134854/1/hae13066.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/134854/2/hae13066_am.pd
Insulator-to-metal transition in sulfur-doped silicon
We observe an insulator-to-metal (I-M) transition in crystalline silicon
doped with sulfur to non-equilibrium concentrations using ion implantation
followed by pulsed laser melting and rapid resolidification. This I-M
transition is due to a dopant known to produce only deep levels at equilibrium
concentrations. Temperature-dependent conductivity and Hall effect measurements
for temperatures T > 1.7 K both indicate that a transition from insulating to
metallic conduction occurs at a sulfur concentration between 1.8 and 4.3 ×
10^20 cm^-3. Conduction in insulating samples is consistent with variable-range
hopping with a Coulomb gap. The capacity for deep states to effect metallic
conduction by delocalization is the only known route to bulk intermediate band
photovoltaics in silicon.
Comment: Submission formatting; 4 journal pages equivalen
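The "variable range hopping with a Coulomb gap" behaviour cited for the insulating samples is conventionally described by the Efros-Shklovskii form, stated here for context as a standard result rather than taken from the paper:

```latex
\sigma(T) = \sigma_0 \exp\!\left[-\left(\frac{T_{\mathrm{ES}}}{T}\right)^{1/2}\right]
```

The exponent 1/2, as opposed to 1/4 for Mott variable-range hopping in three dimensions, is the signature of the Coulomb gap in the density of states at the Fermi level.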